Fast XML Structural Join Algorithms by Partitioning

Authors

  • Nan Tang
  • Jeffrey Xu Yu
  • Kam-Fai Wong
  • Jianxin Li
Abstract

An XML structural join evaluates structural relationships (e.g. parent-child or ancestor-descendant) between XML elements. It serves as an important computation unit in XML pattern matching. Several classical structural join algorithms have been proposed, such as Stack-tree join and XR-Tree join. In this paper, we address the problem of structural join by partitioning. The Dietz numbering scheme is used for encoding, since nodes with Dietz encodings can be well distributed on a plane. We first extend the relationships between nodes to relationships between partitions on a plane and derive observations and properties about the relationships between partitions. Based on these properties, we then propose a new partition-based method, named P-Join, for structural join between ancestor and descendant nodes. Moreover, we present an enhanced partition-based structural join algorithm and two optimized methods. Extensive experiments show that our proposed algorithms outperform the Stack-tree and XR-Tree algorithms. To store the partitioning results, we design a simple but efficient index structure, called PSS-tree. The experimental results show that it has less maintenance overhead than XR-Tree.
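As a rough illustration of the encoding the abstract relies on, the sketch below assigns each node a Dietz label (preorder rank, postorder rank) and tests the ancestor-descendant relationship by comparing labels. It is not P-Join or any algorithm from the paper: the join shown is a plain nested-loop baseline, and the names `Node`, `dietz_label`, `is_ancestor`, and `naive_join` are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)
    pre: int = 0   # preorder rank (x-coordinate on the plane)
    post: int = 0  # postorder rank (y-coordinate on the plane)


def dietz_label(root):
    """Assign (pre, post) ranks to every node with one depth-first traversal."""
    pre_counter = 0
    post_counter = 0

    def visit(node):
        nonlocal pre_counter, post_counter
        pre_counter += 1
        node.pre = pre_counter
        for child in node.children:
            visit(child)
        post_counter += 1
        node.post = post_counter

    visit(root)


def is_ancestor(a, d):
    """a is an ancestor of d iff a comes earlier in preorder and later in postorder."""
    return a.pre < d.pre and a.post > d.post


def naive_join(ancestors, descendants):
    """Nested-loop baseline (not the paper's method): test every candidate pair."""
    return [(a, d) for a in ancestors for d in descendants if is_ancestor(a, d)]


# Illustrative document: <book><section><title/></section><title/></book>
title_a = Node("title")
section = Node("section", [title_a])
title_b = Node("title")
book = Node("book", [section, title_b])

dietz_label(book)
pairs = naive_join([section], [title_a, title_b])
print([(a.tag, d.tag, (d.pre, d.post)) for a, d in pairs])
```

Plotting `pre` against `post` places every descendant of a node to its lower right and every ancestor to its upper left, which is the planar layout that a partition-based approach can exploit instead of comparing all pairs.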

Similar resources

Accelerating XML Structural Join by Partitioning

Structural join is the core part of XML queries and has a significant impact on their performance. Several classical structural join algorithms have been proposed, such as Stack-tree join and XR-Tree join. In this paper, we address the problem of structural join by partitioning. We first extend the relationships between nodes to the relationships between partitions in the...

Structural Joins: A Primitive for Efficient XML Query Pattern Matching

XML queries typically specify patterns of selection predicates on multiple elements that have some specified tree structured relationships. The primitive tree structured relationships are parent-child and ancestor-descendant, and finding all occurrences of these structural relationships in an XML database is a core operation for XML query processing. In this paper, we develop two families of struc...

Fast and Tiny Structural Self-Indexes for XML

XML document markup is highly repetitive and therefore well compressible using dictionary-based methods such as DAGs or grammars. In the context of selectivity estimation, grammar-compressed trees were used before as synopses for structural XPath queries. Here a fully-fledged index over such grammars is presented. The index allows executing arbitrary tree algorithms with a slow-down that is co...

Labeling Scheme and Structural Joins for Graph-Structured XML Data

When XML documents are modeled as graphs, many challenging research issues arise. In particular, query processing for graph-structured XML data brings new challenges because traditional structural join methods cannot be directly applied. In this paper, we propose a labeling scheme for graph-structured XML data. With this labeling scheme, the reachability relationship of two nodes can be judged e...

Amoeba Join: Overcoming Structural Fluctuations in XML Data

There are no universal rules for organizing data in XML. Consequently, semantically identical XML documents may have different structures; we call this structural fluctuation in XML. Finding all the structural fluctuations in an XML document requires verbose path expression queries. To overcome this problem, we developed a novel query processing primitive, called amoeba join. Amoeba join does n...

Journal:
  • Journal of Research and Practice in Information Technology

Volume 40  Issue

Pages  -

Publication date 2008